Abstract: We consider the standard non-parametric regression model with
Gaussian errors but where the data consist of different samples. The ques- 
tion to be answered is whether the samples can be adequately represented 
by the same regression function. To do this we define for each sample a 
universal, honest and non-asymptotic confidence region for the regression 
function. Any subset of the samples can be represented by the same func- 
tion if and only if the intersection of the corresponding confidence regions 
is non-empty. If the empirical supports of the samples are disjoint then the 
intersection of the confidence regions is always non-empty and a negative 
answer can only be obtained by placing shape or quantitative smoothness 
conditions on the joint approximation. Alternatively a simplest joint ap- 
proximation function can be calculated which gives a measure of the cost 
of the joint approximation, for example, the number of extra peaks required. 
1. Introduction

We consider the following problem in non-parametric regression. Given $k$ samples

$$y_{i n_i} = \{(t_{ij}, y_{ij}) : j = 1, \dots, n_i\}, \quad i = 1, \dots, k, \tag{1}$$

with supports

$$S_{i n_i} = \{t_{i1} < t_{i2} < \dots < t_{i n_i}\}, \quad i = 1, \dots, k, \tag{2}$$

the question to be answered is whether they can be simultaneously represented
by a common function $f$. The standard approach is to assume that the data sets
were generated according to the model

$$Y_{i n_i}(t) = f_i(t) + \sigma_i Z_i(t), \quad i = 1, \dots, k, \quad t \in [0, 1], \tag{3}$$

and then to consider the null and alternative hypotheses

$$H_0 : f_1 = \dots = f_k, \qquad H_1 : f_i \ne f_j \ \text{for some } i, j. \tag{4}$$

We assume that the noise processes $Z_i(t)$, $i = 1, \dots, k$, are independent
standard Gaussian white noise. Individual samples generated under (3) will be
denoted by

$$Y_{i n_i} = \{(t_{ij}, Y_{ij}) : j = 1, \dots, n_i\}, \quad i = 1, \dots, k.$$

Here and in the following we use minuscule letters to denote general data sets
and majuscule letters for data generated under (3). We shall mostly restrict
attention to the case $k = 2$; the extension to more samples poses no problems.

Within this setup it is possible to construct tests which are asymptotically
consistent if $\lim n_i = \infty$, $i = 1, 2$, and which can detect alternatives
converging to the null hypothesis at certain rates. This may be formalized by putting

$$f_1(t) - f_2(t) = f_{1,n_1}(t) - f_{2,n_2}(t) = \Delta_n(t), \quad n = \min(n_1, n_2), \tag{5}$$

where $\Delta_n$ is a difference function and measures the rate of convergence to the
null hypothesis. The best result seems to be that of Neumeyer and Dette (2003),
who construct a test which can detect alternatives which converge to the null
hypothesis at the optimal rate $\Delta_n = O(n^{-1/2})$. If the supports are equal,
$S_{i n_i} = \{t_1, \dots, t_{n_i}\}$, $i = 1, 2$, then it is not difficult to construct
such a test as the differences $Y_{1 n_1}(t_j) - Y_{2 n_2}(t_j)$ do not depend on $f$
(see for example Delgado (1992) and Fan and Lin (1998)). The result of Neumeyer
and Dette (2003) continues to hold even if the supports are disjoint,
$S_{1 n_1} \cap S_{2 n_2} = \emptyset$. In this case, however, there are difficulties
which can be most clearly seen in the case of exact data.

If we denote the supremum norm on $[0, 1]$ by $\|\cdot\|_\infty$ then the null and
alternative hypotheses of (4) may be rewritten as

$$H_0 : \|f_1 - f_2\|_\infty = 0, \qquad H_1 : \|f_1 - f_2\|_\infty > 0. \tag{6}$$

If the values of $f_1$ and $f_2$ are known only on disjoint sets $S_{1 n_1}$ and
$S_{2 n_2}$ respectively, then it is not possible to decide between $H_0$ and $H_1$.
This continues to hold even if $f_1$ and $f_2$ are subject to qualitative smoothness
conditions such as infinite differentiability: all one does is to interpolate the data
points using such a function. The addition of noise and the use of asymptotics do not
solve the problem, as indicated by Figure 1. The top panel shows two data sets of
sizes $n_1 = n_2 = 500$ generated according to $Y_1(t) = \exp(1.5t) + 0.25 Z_1(t)$ and
$Y_2(t) = \exp(1.5t) + 3 + 0.25 Z_2(t)$ with disjoint supports taken to be i.i.d. uniform
random variables on $[0, 1]$. The centre panel shows a joint piecewise constant
approximating function with 514 local extreme values. It can be made infinitely
differentiable by convolving it with a Gaussian kernel with a small bandwidth.
The bottom panel shows a sample of size n = 1000 generated using the function 
of the centre panel. It looks very much like the two original data sets. 

In order to distinguish between $H_0$ and $H_1$ it is necessary to place either
quantitative conditions on $f_1$ and $f_2$ such as $\|f_1^{(1)}\|_\infty \le 1$,
$\|f_2^{(1)}\|_\infty \le 1$, or shape restrictions such as $f_1$ and $f_2$ being
monotone. In spite of this all conditions imposed in the literature are of a
qualitative form: Hall and Hart (1990), a bounded first derivative; Härdle and
Marron (1990), Hölder continuity; King et al. (1990); Kulasekera and Wang (1997),
Hölder continuity of order $\beta > 1/2$; Dette and Neumeyer (2001), a continuous
$r$th derivative; Lavergne (2001), a second derivative which is uniformly Lipschitz
of order $\beta$, $0 < \beta < 1$; Neumeyer and Dette (2003), continuous derivatives
of order $d > 2$. The problem is one of uniform convergence which is required to make
the results applicable for finite $n$ and which does not follow from qualitative
conditions alone. What can be said is that if the functions differ, then any
joint approximation will become more complicated as the sample sizes increase.
It is this increase in complexity which we call the cost of the simultaneous
approximation. This is shown in Figure 1 where the individual approximations
are monotone (top panel) but the simplest joint approximation has 514 local
extreme values (centre panel). In the remainder of the paper we show how the
quantification can be carried out. Our approach can be split into two parts:
(1) Firstly, for each sample $y_{i n_i}$ we define a so-called approximation region
$A_{i n_i}$ which specifies those functions $f_i$ for which the model (3) is an
adequate approximation for the sample. The intersection of the approximation
regions $A_{1 n_1} \cap A_{2 n_2}$ contains all those functions which simultaneously
approximate both samples. It is also the approximation region for the
simultaneous approximation. A similar idea in the context of the one-way
table in the analysis of variance is expounded in Davies (2004).

(2) Secondly, using some measure of complexity we regularize within each
approximation region by choosing the simplest function which is consistent
with the data. This is in the spirit of Donoho (1988) who pointed out that
in non-parametric regression and density problems it is possible only to
give lower bounds on certain quantities of interest such as the number of
modal values.

In Section 2 we define the approximation or confidence regions $A_{i n_i}$ and
in Section 3 we apply the ideas and concepts to the problem of comparing
regression functions.
2. Approximation regions
2.1. Single samples 



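The following is based on Davies et al. (2008c). We consider a single sample of
data $Y_n = \{(t_i, Y(t_i))\}_{i=1}^{n}$ generated under the model

$$Y(t) = f(t) + \sigma Z(t) \tag{7}$$

where we take the $t_i$ to be ordered. Based on this model we consider two different
approximation or confidence regions $A_n$ and $A_n^*$ defined as follows. For any
function $g$ and any interval $I \subset [0, 1]$ we put

$$w(g, Y_n, I) = \frac{1}{\sqrt{|I|}} \sum_{t_i \in I} \bigl(Y(t_i) - g(t_i)\bigr) \tag{8}$$

where $|I|$ denotes the number of points $t_i \in I$. The confidence region $A_n$ is
defined by

$$A_n(Y_n, \mathcal{I}_n, \sigma, \tau_n) = \Bigl\{ g : \max_{I \in \mathcal{I}_n} w(g, Y_n, I)^2 \le \sigma^2 \tau_n \log(n) \Bigr\} \tag{9}$$

where $\mathcal{I}_n$ is a collection of intervals of $[0, 1]$. We restrict attention to
the cases where $\mathcal{I}_n$ is either the set of all intervals or a set of intervals
of the form

$$\mathcal{I}_n(\lambda) = \Bigl\{ [t_{l(j,k)}, t_{u(j,k)}] : l(j,k) = \lfloor (j-1)\lambda^k + 1 \rfloor,\ u(j,k) = \min\{\lfloor j\lambda^k \rfloor, n\},\ j = 1, \dots, \lceil n\lambda^{-k} \rceil,\ k = 1, \dots, \lceil \log n / \log \lambda \rceil \Bigr\} \tag{10}$$

for some $\lambda > 1$. Our default choice is the (wavelet) dyadic scheme
$\mathcal{I}_n(2)$. For any given $\alpha$ and collection of intervals $\mathcal{I}_n$
we define $\tau_n(\alpha)$ by

$$P\Bigl( \max_{I \in \mathcal{I}_n} \frac{1}{|I|} \Bigl( \sum_{t_i \in I} Z(t_i) \Bigr)^2 \le \tau_n(\alpha) \log(n) \Bigr) = \alpha. \tag{11}$$

The value of $\tau_n(\alpha)$ may be determined by simulations. These show that for
$\mathcal{I}_n = \mathcal{I}_n(2)$ we have $\tau_n(0.95) < 3$ for all $n > 500$. If
$\mathcal{I}_n$ contains all singletons $\{t_i\}$, as will always be the case, it
follows from Dümbgen and Spokoiny (2001) and Kabluchko (2007) that
$\lim_{n \to \infty} \tau_n(\alpha) = 2$ for any $\alpha$. One immediate consequence
of (11) is

$$P\bigl(f \in A_n(Y_n, \mathcal{I}_n, \sigma, \tau_n(\alpha))\bigr) = \alpha \tag{12}$$

so that $A_n$ is a universal, exact and non-asymptotic confidence region for $f$ of
size $\alpha$.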
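The definitions (8), (9) and (11) can be illustrated with a minimal sketch. It assumes a dyadic family that starts with the singletons and works with 0-based index blocks rather than the design points themselves; the function names (`dyadic_intervals`, `max_w_squared`, `in_region`, `simulate_tau`) are hypothetical and are not taken from the paper or from any software accompanying it.

```python
import math
import random


def dyadic_intervals(n):
    """Index blocks (lo, hi), 0-based and inclusive, of lengths 1, 2, 4, ...

    Assumption: the family starts with the singletons and ends with one
    block covering all n points.
    """
    intervals = []
    length = 1
    while True:
        for lo in range(0, n, length):
            intervals.append((lo, min(lo + length, n) - 1))
        if length >= n:
            break
        length *= 2
    return intervals


def max_w_squared(residuals, intervals):
    """Maximum over the intervals of (sum of residuals)^2 / |I|, cf. (8) and (9)."""
    prefix = [0.0]
    for r in residuals:
        prefix.append(prefix[-1] + r)
    return max(
        (prefix[hi + 1] - prefix[lo]) ** 2 / (hi - lo + 1) for lo, hi in intervals
    )


def in_region(y, g, sigma, tau, intervals):
    """Check whether the fitted values g lie in the region (9) for the data y."""
    residuals = [yi - gi for yi, gi in zip(y, g)]
    return max_w_squared(residuals, intervals) <= sigma ** 2 * tau * math.log(len(y))


def simulate_tau(n, alpha=0.95, reps=200, seed=0):
    """Monte Carlo estimate of tau_n(alpha) in (11) for the dyadic family."""
    rng = random.Random(seed)
    intervals = dyadic_intervals(n)
    stats = sorted(
        max_w_squared([rng.gauss(0.0, 1.0) for _ in range(n)], intervals) / math.log(n)
        for _ in range(reps)
    )
    return stats[min(reps - 1, math.ceil(alpha * reps) - 1)]
```

Prefix sums keep the cost of each interval at a constant number of operations, so the dyadic family is checked in time linear in the number of intervals. The Monte Carlo quantile is only as accurate as the number of replications allows.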

The confidence region (9) treats all intervals equally. The second confidence
region $A_n^*$ downweights the importance of small intervals and is defined as
follows. Dümbgen and Spokoiny (2001) extended Lévy's uniform modulus of
continuity of the Brownian motion and showed that

$$\sup_{0 < s < t < 1} \frac{\dfrac{(B(t) - B(s))^2}{t - s} - 2\log\bigl(1/(t - s)\bigr)}{\log\bigl(\log(e^e/(t - s))\bigr)} < \infty \quad \text{a.s.} \tag{13}$$

If we embed the partial sums $\sum_{t_i \in I} Z(t_i)/\sqrt{|I|}$, $I \in \mathcal{I}_n$,
in a standard Brownian motion it follows that

$$\sup_{I \in \mathcal{I}_n} \frac{\bigl(\sum_{t_i \in I} Z(t_i)\bigr)^2 / |I| - 2\log(n/|I|)}{\log\bigl(\log(e^e n/|I|)\bigr)} = \Gamma < \infty \quad \text{a.s.} \tag{14}$$

This implies that for any $\alpha$ we can find a $\gamma_n = \gamma_n(\alpha)$ such that

$$A_n^*(Y_n, \mathcal{I}_n, \sigma, \gamma_n(\alpha)) = \Bigl\{ g : |w(g, Y_n, I)| \le \sigma \sqrt{2\log(n/|I|) + \gamma_n(\alpha) \log\bigl(\log(e^e n/|I|)\bigr)} \ \text{for all } I \in \mathcal{I}_n \Bigr\} \tag{15}$$

is a universal, exact and non-asymptotic $\alpha$-confidence region for $f$. The values
of $\gamma_n$ may be determined by simulation. For $\alpha = 0.95$ and with
$\mathcal{I}_n = \mathcal{I}_n(2)$ a good approximation for $\gamma_n(\alpha)$ for
$n > 100$ is given by

$$\gamma_n(0.95) \approx 5.77 - \exp(2.89 - 0.6\log(n)). \tag{16}$$
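As an illustration of (15) and (16), the following sketch evaluates the approximation for $\gamma_n(0.95)$ and the interval-dependent bound used by $A_n^*$. The names `gamma_095`, `star_bound` and `in_star_region` are hypothetical, and the intervals are assumed to be given as 0-based inclusive index pairs.

```python
import math


def gamma_095(n):
    """Approximation (16) for gamma_n(0.95); stated in the text for n > 100."""
    return 5.77 - math.exp(2.89 - 0.6 * math.log(n))


def star_bound(n, size, sigma, gamma):
    """Right-hand side of the inequality in (15) for an interval with `size` points."""
    log_ratio = math.log(n / size)
    # log(log(e^e * n/|I|)) is written as log(e + log(n/|I|)), which is the same number.
    return sigma * math.sqrt(2.0 * log_ratio + gamma * math.log(math.e + log_ratio))


def in_star_region(y, g, sigma, gamma, intervals):
    """Check whether g lies in the region (15); intervals are (lo, hi) index pairs."""
    n = len(y)
    prefix = [0.0]
    for yi, gi in zip(y, g):
        prefix.append(prefix[-1] + (yi - gi))
    for lo, hi in intervals:
        size = hi - lo + 1
        w = (prefix[hi + 1] - prefix[lo]) / math.sqrt(size)
        if abs(w) > star_bound(n, size, sigma, gamma):
            return False
    return True
```

The bound is largest for singletons and shrinks to $\sigma\sqrt{\gamma_n}$ for the interval containing all points, which is the sense in which small intervals are downweighted.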

The confidence regions $A_n(Y_n, \mathcal{I}_n, \sigma, \tau_n)$ and
$A_n^*(Y_n, \mathcal{I}_n, \sigma, \gamma_n)$ both require the true value of $\sigma$.
We indicate how this may be obtained from the data in such a manner that the
confidence region now becomes honest (Li (1989)) rather than exact. The following
argument makes the somewhat casual remarks on the problem made in Davies et al.
(2008c) more precise. Using the normal approximation for the binomial $(n, 1/2)$
distribution it follows that for an i.i.d. sample $(W_1, \dots, W_n)$ with common
continuous distribution $P$ with median $\mathrm{med}(P)$

$$P\bigl(W_{(\lceil n/2 + z_\beta \sqrt{n}/2 \rceil)} \ge \mathrm{med}(P)\bigr) = \beta$$

where $W_{(i)}$ denotes the $i$th order statistic of the sample and $z_\beta$ the
$\beta$-quantile of the standard normal distribution. On putting $\beta = 0.995$ we obtain

$$P\bigl(W_{(\lceil n/2 + 1.288\sqrt{n} \rceil)} \ge \mathrm{med}(P)\bigr) = 0.995.$$

We now apply this to the $\lfloor n/2 \rfloor$ random variables

$$V_i = |Y(t_{2i}) - Y(t_{2i-1})|, \quad i = 1, \dots, \lfloor n/2 \rfloor.$$

It follows from Anderson (1955) that whatever the value of the function $f$

$$P\bigl(1.4826\, V_i/\sqrt{2} \ge \sigma\bigr) \ge 1/2$$

and consequently

$$P\bigl(1.4826\, V_{(\lceil n/4 + 0.9108\sqrt{n} \rceil)}/\sqrt{2} \ge \sigma\bigr) \ge 0.995.$$

On using the corresponding result for the $\lfloor (n-1)/2 \rfloor$ random variables

$$|Y(t_{2i+1}) - Y(t_{2i})|, \quad i = 1, \dots, \lfloor (n-1)/2 \rfloor,$$

it follows that if we define $\sigma_n$ to be the $\lceil n/2 + 1.814\sqrt{n} \rceil$
order statistic of the random variables

$$\frac{1.4826}{\sqrt{2}} |Y(t_2) - Y(t_1)|, \ \dots, \ \frac{1.4826}{\sqrt{2}} |Y(t_n) - Y(t_{n-1})|$$

then

$$P(\sigma_n \ge \sigma) \ge 0.99$$

for all $n > 100$ say whatever the value of $f$. It follows that
$A_n(Y_n, \mathcal{I}_n, \sigma_n, \tau_n)$ and
$A_n^*(Y_n, \mathcal{I}_n, \sigma_n, \gamma_n)$ are now universal and non-asymptotic
honest confidence regions whatever the value of $f$ but with $\alpha$ replaced by
$\alpha - 0.01$,

$$P\bigl(f \in A_n(Y_n, \mathcal{I}_n, \sigma_n, \tau_n(\alpha))\bigr) \ge \alpha - 0.01, \tag{17}$$

with the corresponding inequality for $A_n^*$. In spite of this the default value for
$\sigma_n$ we shall use in this paper is

$$\sigma_n = \frac{1.4826}{\sqrt{2}}\, \mathrm{median}\bigl(|Y(t_2) - Y(t_1)|, \dots, |Y(t_n) - Y(t_{n-1})|\bigr). \tag{18}$$

It is simpler, the difference is in general small, it was used in Davies and Kovac
(2001), Davies et al. (2006), Davies et al. (2008a) and it also corresponds to
using the first order Haar wavelets to estimate $\sigma$ (Donoho et al. (1995)).
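The two scale estimates discussed above can be sketched as follows. The sketch assumes that both use the scaling $1.4826/\sqrt{2}$ applied to absolute first differences, and it clips the rank of the order statistic to the number of available differences; the function names are hypothetical.

```python
import math
import statistics

# Scaling that turns the median of |N(0, 2 sigma^2)| into sigma.
SCALE = 1.4826 / math.sqrt(2.0)


def abs_differences(y):
    """Absolute differences of successive observations (y ordered by design point)."""
    return [abs(b - a) for a, b in zip(y, y[1:])]


def sigma_median(y):
    """Default estimate: scaled median of the absolute first differences."""
    return SCALE * statistics.median(abs_differences(y))


def sigma_honest(y):
    """Scaled order statistic of rank ceil(n/2 + 1.814 sqrt(n)) of the differences."""
    n = len(y)
    diffs = sorted(abs_differences(y))
    rank = min(math.ceil(n / 2 + 1.814 * math.sqrt(n)), len(diffs))  # clipped for small n
    return SCALE * diffs[rank - 1]
```

Because the order-statistic version uses a rank above the median, it is never smaller than the median-based value on the same data, which is what makes the resulting region conservative.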

In Davies (1995) implicit use is made of a confidence region based on the
lengths of runs of the signs of the residuals. Explicit universal, honest and
non-asymptotic confidence regions based on the signs of the residuals are to be
found in Dümbgen (1998, 2003, 2007) and Dümbgen and Johns (2004).

2.2. A one-way table for regression functions 



This section extends the approach given in Davies (2004) for the one-way table
to the case of regression functions. We consider $k$ samples
$Y_{i n_i} = \{(t_{ij}, Y_i(t_{ij}))\}_{j=1}^{n_i}$ generated under (3). As a first step
we replace the $\alpha$ in (11) and (15) by $\alpha_k = \alpha^{1/k}$ where $k$ is the
number of samples. This adjusts the size of each confidence region to take into
account the number of samples. The confidence region for the $i$th sample is given by

$$A_{i n_i} = A_{i n_i}\bigl(Y_{i n_i}, \mathcal{I}_{i n_i}, \sigma_{i n_i}, \tau_{i n_i}(\alpha_k)\bigr) = \Bigl\{ g : \max_{I \in \mathcal{I}_{i n_i}} |w(g, Y_{i n_i}, I)| \le \sigma_{i n_i} \sqrt{\tau_{i n_i}(\alpha_k) \log(n_i)} \Bigr\}. \tag{19}$$

We denote by $P_f$ with $f = (f_1, \dots, f_k)$ the probability model where all the
samples $Y_{i n_i}$, $i = 1, \dots, k$, are independently distributed and $Y_{i n_i}$
was generated under (7) with $f = f_i$, $i = 1, \dots, k$. It follows from the choice
$\alpha_k = \alpha^{1/k}$ that

$$P_f\bigl(f_i \in A_{i n_i}(Y_{i n_i}, \mathcal{I}_{i n_i}, \sigma_{i n_i}, \tau_{i n_i}(\alpha_k)),\ i = 1, \dots, k\bigr) \ge \alpha \quad \text{for all } f. \tag{20}$$
All questions concerning the relationships between the functions can now be
answered by using the confidence regions $A_{i n_i}$. For example, the question as to
whether the $f_i$ are all equal translates into the question as to whether

$$A_{n_k} = \bigcap_{i=1}^{k} A_{i n_i} \tag{21}$$

is empty or not. If the supports $S_{i n_i}$ of the samples are not disjoint then it
is possible that the linear inequalities which define the confidence regions are
inconsistent. In this case $A_{n_k} = \emptyset$ and there is no joint approximating
function. If the supports $S_{i n_i}$ of the samples are pairwise disjoint then
$A_{n_k}$ is non-empty and so there always is a joint approximation function. Without
further restrictions on the joint approximating function nothing more can be said. If
however the joint approximating function is required to satisfy, for example, a
shape constraint such as monotonicity, then it may be the case that there is no
joint approximating function. Figure 1 shows just such a case where there are
monotonic approximations for each sample individually but no monotonic joint
approximation. To answer questions of this nature we must regularize within
$A_{n_k}$ and this is the topic of the next section.

3. Regularization
3.1. Disjoint supports 

We consider firstly the case when the supports $S_{i n_i}$, $i = 1, \dots, k$, are
pairwise disjoint. In this case the joint approximation region $A_{n_k}$ is non-empty
and will in general include many functions which would not be regarded as being
acceptable. Indeed, it may be that $A_{n_k}$ does not contain any acceptable function.
The definition of 'acceptable' will usually be formulated in terms of shape or
quantitative smoothness constraints.

Alternatively, rather than impose prior restrictions, one can determine a
simplest function in the joint approximating region. One possibility is to minimize
the number of local extreme points of a function $g$ subject to $g \in A_{n_k}$.
Figure 1 shows an example of this approach where the joint approximating function
has 514 internal local extreme points compared with monotone approximating
functions for both data sets separately. The additional local extreme points can
be regarded as the cost of the joint approximation. The same idea can be used
if simplicity is defined in terms of smoothness, for example by minimizing the
total variation $TV(g^{(2)})$ of the second derivative subject to $g$ lying in the
approximation region. The upper panel of Figure 2 shows the data and curves of
Figure 1 but with the values of the second sample reduced by an amount 2.3.
There is now a joint monotone approximation which is shown in the lower panel
of Figure 2 so there is no cost in terms of the number of local extreme values. If
we minimize the total variation of the second derivative subject to the function
being an adequate monotone approximation then there is a cost. The upper
panel of Figure 3 shows the approximations of the two individual samples which
minimize the total variation of the second derivatives subject to the functions
being monotone. The lower panel of Figure 3 shows the joint approximation for
the combined sample. The second derivatives are shown in Figure 4. The values
of the total variation of the second derivative are 9.317 and 6.305 for the
individual samples and 59.496 for the joint sample.



3.2. Intersecting supports 



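As mentioned in Section 1 the Neumeyer and Dette (2003) procedure can detect
differences of the order of $n^{-1/2}$. We now consider the size of detectable
differences for our procedure in the case of equal supports. For simplicity we
consider only the case $k = 2$ and assume that the supports $S_{1 n_1}$ and $S_{2 n_2}$
are given by $t_{1i} = t_{2i} = i/n$. We take $\mathcal{I}_n$ to be the set of all
intervals but indicate below the adjustments required if
$\mathcal{I}_n = \mathcal{I}_n(\lambda)$ as in (10). We state the results using
$\sigma_1$ and $\sigma_2$ rather than the estimates (18) and write
$\tau_n = \tau_n(\alpha^{1/2})$. If a joint approximating function $f_n$ exists then
for any interval $I$ of $[0, 1]$ we have

$$\frac{1}{\sqrt{|I|}} \Bigl| \sum_{t_i \in I} \bigl(Y_j(t_i) - f_n(t_i)\bigr) \Bigr| \le \sigma_j \sqrt{\tau_n \log(n)}, \quad j = 1, 2,$$

and hence

$$\frac{1}{\sqrt{|I|}} \Bigl| \sum_{t_i \in I} \bigl(Y_1(t_i) - Y_2(t_i)\bigr) \Bigr| \le (\sigma_1 + \sigma_2) \sqrt{\tau_n \log(n)}.$$

For the noise we have with probability $\alpha$

$$\frac{1}{\sqrt{|I|}} \Bigl| \sum_{t_i \in I} \bigl(\sigma_1 Z_1(t_i) - \sigma_2 Z_2(t_i)\bigr) \Bigr| \le (\sigma_1 + \sigma_2) \sqrt{\tau_n \log(n)}$$

and hence with probability $\alpha$

$$\frac{1}{\sqrt{|I|}} \Bigl| \sum_{t_i \in I} \bigl(f_1(t_i) - f_2(t_i)\bigr) \Bigr| \le 2(\sigma_1 + \sigma_2) \sqrt{\tau_n \log(n)}.$$

Suppose now that $f_1$ and $f_2$ differ by an amount $\eta_n$ on an interval
$I_n \subset [0, 1]$, that is $f_1(t) - f_2(t) > \eta_n$, $t \in I_n$, and that the
length of $I_n$ is $\delta_n$. As $I_n$ contains about $n\delta_n$ support points we
see that

$$\frac{1}{\sqrt{n\delta_n}}\, n\delta_n \eta_n \le 2(\sigma_1 + \sigma_2) \sqrt{\tau_n \log(n)}$$

which implies that no joint approximation will exist if

$$\sqrt{\delta_n}\, \eta_n > 2(\sigma_1 + \sigma_2) \sqrt{\tau_n \log(n)/n}. \tag{22}$$

It follows that with probability of at least $\alpha$, deviations satisfying the latter
inequality will be detected. If $\mathcal{I}_n = \mathcal{I}_n(\lambda)$ as in (10) it
follows that there exists an interval $I_n' \subset I_n$ in $\mathcal{I}_n(\lambda)$ of
length $|I_n'| \ge |I_n|/\lambda = \delta_n/\lambda$ for which
$f_1(t) - f_2(t) > \eta_n$, $t \in I_n'$. This requires replacing (22) by

$$\sqrt{\delta_n}\, \eta_n > 2\sqrt{\lambda}\, (\sigma_1 + \sigma_2) \sqrt{\tau_n \log(n)/n}. \tag{23}$$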

We consider a situation similar to that of Figure 1 as is shown in Figure 5.
The sample sizes are $n = 500$ with common supports $t_j = j/n$ and we take
$\alpha$ to be 0.95 so that $\alpha_2 = 0.95^{1/2} = 0.9747$. For this choice of
$\alpha$ and with $\mathcal{I}_n = \mathcal{I}_n(2)$ simulations give $\tau_n = 2.973$.
We set $f_1(t) = \exp(1.5t)$ and put $f_2(t) = f_1(t)$ except for
$t \in [0.402, 0.44]$ where $f_2(t) = f_1(t) + \eta_n$. For this interval
$\delta_n = 20/500$ and so we expect to be able to detect deviations $\eta_n$ of the
order

$$\eta_n = 2\sqrt{2}\, (0.25 + 0.25) \sqrt{2.973 \log 500} / \sqrt{20} = 1.359 \tag{24}$$

with probability of at least 0.95. For the data shown in Figure 5 the difference
is detected with $\eta_n = 0.575$ but not with $\eta_n = 0.574$.
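The arithmetic behind (24) can be checked with a few lines. The sketch evaluates the right-hand side of (23) divided by $\sqrt{\delta_n}$; the function name `detectable_eta` and its argument names are hypothetical.

```python
import math


def detectable_eta(n, delta, sigma1, sigma2, tau, lam=1.0):
    """Smallest eta satisfying (23); lam = 1 gives (22) for the set of all intervals."""
    return (
        2.0
        * math.sqrt(lam)
        * (sigma1 + sigma2)
        * math.sqrt(tau * math.log(n) / n)
        / math.sqrt(delta)
    )


# Values of the example: n = 500, delta_n = 20/500, sigma_1 = sigma_2 = 0.25,
# tau_n = 2.973 and the dyadic scheme (lam = 2).
eta = detectable_eta(500, 20 / 500, 0.25, 0.25, 2.973, lam=2.0)
print(round(eta, 3))  # 1.359
```

The bound is a worst-case guarantee, which is consistent with the much smaller deviation of 0.575 already being detected for the particular data of Figure 5.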



If we put $\delta_n = 1$ in (22) so that the two functions deviate over the whole
interval then

$$\eta_n > 2\sqrt{2}\, (\sigma_1 + \sigma_2) \sqrt{\tau_n \log(n)/n} \tag{25}$$

which implies that deviations of order $\sqrt{\log(n)/n}$ can be detected.

The same analysis can be carried through using the approximation region
$A_n^*$. Corresponding to (22) we obtain

$$\eta_n\, h(\delta_n) > 2(\sigma_1 + \sigma_2)/\sqrt{n} \tag{26}$$

where

$$h(\delta) = \sqrt{\frac{\delta}{2\log(1/\delta) + \gamma_n(\alpha) \log\log(e^e/\delta)}} \tag{27}$$

is monotonically increasing and $\gamma_n(\alpha)$ is bounded in $n$. In particular, if
$\delta_n = 1$, then deviations of order $1/\sqrt{n}$ can be detected.



3.3. Adapting the taut string algorithm 



The taut string algorithm of Davies and Kovac (2001) has proved to be very
effective in determining the number of local extremes of a function contaminated
by noise (see Davies et al. (2008b)). We show how it may be adapted to the case
of $k$ samples. Let $n \le \sum_{i=1}^{k} n_i$ denote the number of different $t_{ij}$
values which we order as $0 \le t_1 < \dots < t_n \le 1$. For each sample we calculate
the values of $\sigma_{i n_i}$ given by (18). We put

$$y_m = \frac{\sum_{t_{ij} = t_m} y_{ij}/\sigma_{i n_i}^2}{\sum_{t_{ij} = t_m} 1/\sigma_{i n_i}^2}, \quad m = 1, \dots, n. \tag{28}$$

As a first step we check whether a joint approximating function exists. We
do this by putting $f_n(t_m) = y_m$ and then determining whether $f_n \in A_{n_k}$. If
this is not the case we conclude that no joint approximation exists. If a joint
approximation exists we put $\lambda_m = \sum_{t_{ij} = t_m} 1/\sigma_{i n_i}^2$ and
calculate the partial sums $y_m^\circ = \sum_{j=1}^{m} \lambda_j y_j$ and
$\Lambda_m = \sum_{j=1}^{m} \lambda_j$ for $m = 1, \dots, n$ with
$y_0^\circ = \Lambda_0 = 0$. The initial lower and upper bounds $l_m$ and $u_m$ are set
to be $l_m = y_m^\circ - D$, $u_m = y_m^\circ + D$ where $D$ is chosen so large that the
straight line joining $(0, 0)$ and $(y_n^\circ, \Lambda_n)$ lies in the tube. For a
given tube, the taut string through the tube and constrained to pass through $(0, 0)$
and $(y_n^\circ, \Lambda_n)$ is calculated. The value of the estimate $f_n(t_m)$
at the point $t_m$ is taken to be the left-hand derivative of the taut string except
for the first point where the right-hand derivative is taken. For each data set
individually it is now checked whether $f_n \in A_{i n_i}$, $i = 1, \dots, k$. If this
is the case the procedure ends. Otherwise those intervals for which the inequalities
defining the $A_{i n_i}$ do not hold are noted and the tube is squeezed at all points
$t_{j-1}$ and $t_j$ for which $t_j$ lies in such an interval. This is continued until
a function $f_n \in A_{n_k}$ is found.
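The pooling step that precedes the taut string calculation can be sketched as follows. The sketch assumes inverse-variance weights $1/\sigma_{i n_i}^2$ and a weighted mean at design points shared by several samples; it does not implement the taut string itself, and the function name `pool_samples` and the data layout are hypothetical.

```python
from collections import defaultdict


def pool_samples(samples, sigmas):
    """Pool k samples onto the ordered set of distinct design points.

    samples: list of (t, y) pairs of equal-length lists, one pair per sample
    sigmas:  list of noise scales, one per sample
    Returns the ordered points, the weighted means, the weights and the
    partial sums of weight * mean and of the weights (both starting at 0).
    """
    weight = defaultdict(float)
    weighted_sum = defaultdict(float)
    for (t, y), s in zip(samples, sigmas):
        for ti, yi in zip(t, y):
            weight[ti] += 1.0 / s ** 2
            weighted_sum[ti] += yi / s ** 2
    points = sorted(weight)
    lam = [weight[p] for p in points]
    means = [weighted_sum[p] / weight[p] for p in points]
    cum_y, cum_w = [0.0], [0.0]
    for l, m in zip(lam, means):
        cum_y.append(cum_y[-1] + l * m)
        cum_w.append(cum_w[-1] + l)
    return points, means, lam, cum_y, cum_w
```

For pairwise disjoint supports every design point carries a single observation, so the weighted means are just the observations themselves and only the weights differ between samples.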



4. Comparison with other procedures 
4.1. Analysis and simulations

As the approach developed in this paper is not comparable with others when
the supports are disjoint, we restrict attention to the case of equal supports.
For simplicity we take $k = 2$. For such data Delgado (1992) proposed the test
statistic

$$T_n = \max_{1 \le j \le n} \Bigl| \sum_{i=1}^{j} \bigl(Y_1(t_i) - Y_2(t_i)\bigr) \Bigr| \Big/ (\sigma_n \sqrt{n}) \tag{29}$$

where $\sigma_n$ is some quantifier of the noise. Under the null hypothesis
$f_1 = f_2 = f$ the distribution of $T_n$ does not depend on $f$. In this special case
the test statistic of Neumeyer and Dette also reduces to (29). If the data were
generated under (3) then under $H_0$ the distribution of $T_n$ converges to that of
$\max_{0 \le t \le 1} |B(t)|$ where $B$ is a standard Brownian motion. The
0.95-quantile is approximately 2.24 which leads to rejection of $H_0$ if

$$T_n > 2.24. \tag{30}$$

Suppose now that the data are generated as in (3) with $f_1(t) = f_2(t)$ apart from
$t$ in an interval $I$ of length $\delta_n$ where $f_1(t) - f_2(t) > \eta_n$. It
follows from (30) that $H_0$ will be rejected with high probability if

$$\delta_n \eta_n > 4.48\, \sigma/\sqrt{n} \tag{31}$$

where $\sigma^2 = \sigma_1^2 + \sigma_2^2$. If $\delta_n = 1$ deviations of the order
of $\sigma/\sqrt{n}$ can be picked up which contrasts with the
$O(\sigma\sqrt{\log(n)/n})$ of (25). If however $\delta_n = 1/\sqrt{n}$ it
follows from (31) that the test statistic $T_n$ will pick up deviations of the order
of $\sigma$. It follows from (22) and (26) that the methods based on $A_n$ and $A_n^*$
will both pick up deviations of the order of $\sigma\sqrt{\log(n)}/n^{1/4}$.
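A partial-sum statistic of the form (29) and the rejection rule (30) can be sketched as below. The sketch assumes equal supports, so that the two samples can be differenced point by point, and a user-supplied scale for the differences; the function names are hypothetical.

```python
import math


def delgado_statistic(y1, y2, sigma):
    """Maximum absolute partial sum of y1 - y2, scaled by sigma * sqrt(n), cf. (29).

    sigma is the scale of a single difference, i.e. sqrt(sigma_1^2 + sigma_2^2)
    under the model of the text.
    """
    n = len(y1)
    partial, largest = 0.0, 0.0
    for a, b in zip(y1, y2):
        partial += a - b
        largest = max(largest, abs(partial))
    return largest / (sigma * math.sqrt(n))


def rejects(y1, y2, sigma, critical_value=2.24):
    """Rejection rule (30) with the asymptotic 0.95-quantile as default."""
    return delgado_statistic(y1, y2, sigma) > critical_value
```

Since the statistic accumulates differences from the left, a deviation whose positive and negative parts cancel contributes little to the maximum, which is the weakness the later simulations probe.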
Another test which is applicable in this situation is due to Fan and Lin (1998).
If we denote the Fourier transform of the data sets by $\tilde{Y}_1(i)$ and
$\tilde{Y}_2(i)$, $i = 1, \dots, n$, and order them as described in Fan and Lin (1998),
their test statistic reduces to

$$T^* = \max_{1 \le m \le n} \frac{1}{\sqrt{2m}} \sum_{i=1}^{m} \Bigl( \bigl(\tilde{Y}_2(i) - \tilde{Y}_1(i)\bigr)^2/\sigma_n^2 - 1 \Bigr) \tag{32}$$

where $\sigma_n$ is some estimate of the standard deviation of the
$\tilde{Y}_2(i) - \tilde{Y}_1(i)$. For data generated under the model (3) the critical
value of $T^*$ can be obtained by simulations. It is not as simple to determine the
size of the deviations which can be detected by the test (32) as the test statistic is
a function of the Fourier transforms and the differences in the functions must be
translated into differences in the Fourier transforms. The first member of the sum in
(32) is the difference of the means and this is given the largest weight. We do not
pursue this further but give the results of a small simulation study.

We put $n = 500$ and consider two samples of the form

$$Y_1(i/n) = Z_1(i/n), \quad i = 1, \dots, n = 500, \tag{33}$$

$$Y_2(i/n) = g(i/n) + Z_2(i/n), \quad i = 1, \dots, n = 500, \tag{34}$$

where the $Z_j(i/n)$ are i.i.d. $N(0, 1)$ random variables with $g$ given
by one of
by one of 



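(1) $g_1(t) = \eta$, $0 \le t \le 1$,

(2) $g_2(t) = \begin{cases} \eta, & 0 \le t \le 1/2, \\ -\eta, & 1/2 < t \le 1, \end{cases}$

(3) $g_3(t) = \begin{cases} 0, & 0 \le t < U, \\ \eta, & U \le t < U + 1/4, \\ 0, & U + 1/4 \le t \le 1, \end{cases}$

(4) $g_4(t) = \begin{cases} 0, & 0 \le t < U, \\ \eta, & U \le t < U + 1/8, \\ -\eta, & U + 1/8 \le t < U + 1/4, \\ 0, & U + 1/4 \le t \le 1, \end{cases}$

(35)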

where $U$ is uniformly distributed on $[0, 3/4]$ and independent of the $Z_i$,
$i = 1, 2$. The four procedures, Delgado-Neumeyer-Dette, Fan-Lin and those based on
$A_n$ and $A_n^*$ were all calibrated to give tests of size 0.05 for testing $g = 0$.
The critical values for the Delgado-Neumeyer-Dette and Fan-Lin tests are 2.22
and 6.97 respectively. The value of $\tau_n$ for the test based on $A_n$ is 1.46 and the
corresponding value of $\gamma_n$ for that based on $A_n^*$ is 0.66. Figure 6 shows the
power of the tests for different values of $\eta$. The upper panels are the results for
$g$ given by (35) (1) and (35) (2) and the lower panels for $g$ given by (35) (3)
and (35) (4). The colour scheme is as follows: Delgado-Neumeyer-Dette blue,
Fan-Lin black, $A_n$ green and $A_n^*$ red. The results confirm the analysis given
above. The Delgado-Neumeyer-Dette and Fan-Lin tests are better with $g$ given
by (1) but if the mean difference is zero (2), or the interval is small (3) or both
(4) then they are outperformed by the procedure based on $A_n^*$ and, in case 4,
also by that based on $A_n$.
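The four alternatives in (35) and the simulation design (33), (34) can be written down as a short sketch. The treatment of the interval end points is an assumption, as is the use of half-open intervals throughout; the names `make_g` and `simulate_pair` are hypothetical.

```python
import random


def make_g(case, eta, u=0.0):
    """Return one of the four alternatives in (35) as a function of t in [0, 1].

    u plays the role of U and is only used for cases 3 and 4.
    """
    def g(t):
        if case == 1:
            return eta
        if case == 2:
            return eta if t <= 0.5 else -eta
        if case == 3:
            return eta if u <= t < u + 0.25 else 0.0
        if case == 4:
            if u <= t < u + 0.125:
                return eta
            if u + 0.125 <= t < u + 0.25:
                return -eta
            return 0.0
        raise ValueError("case must be 1, 2, 3 or 4")
    return g


def simulate_pair(case, eta, n=500, seed=0):
    """Draw the two samples (33) and (34) on the grid i/n with N(0, 1) noise."""
    rng = random.Random(seed)
    u = rng.uniform(0.0, 0.75)  # U is uniform on [0, 3/4]
    g = make_g(case, eta, u)
    y1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y2 = [g((i + 1) / n) + rng.gauss(0.0, 1.0) for i in range(n)]
    return y1, y2
```

Cases 2 and 4 integrate to zero over $[0, 1]$, which is why tests that put most weight on the difference of the means are expected to have little power against them.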
4.2. An application

We give an example with some real data from the area of thin-film physics. They 
give the number of photons of refracted X-rays as a function of the angle of re- 
fraction and were kindly supplied by Professor Dieter Mergel of the University of 
Duisburg-Essen. Two such data sets are shown in the top panel of Figure 7; the
differences $y_1(t_i) - y_2(t_i)$ are shown in the bottom panel. Each data set is
composed of 4806 measurements and the design points are the same. The samples
differ in the manner in which the thin film was prepared. One of the questions to 
be answered is whether the results of the two methods are substantially different. 



The noise levels for the data sets are the same, namely 8.317, which is
explainable by the fact that the data are integer valued. The differences between
the two data sets are concentrated on intervals each containing about 40
observations. The estimate (31) suggests that the differences will have to be of the
order of 92 to be detected with a degree of certainty by the Delgado-Neumeyer-Dette
test. The actual differences are of about this order and in fact the test
fails to reject the null hypothesis at the 0.1 level. The realized value of the test
statistic is 1.734 as against the critical value of 1.90 given in (30). The Fan-Lin
test (32) rejects the null hypothesis at the 0.01 level. The realized value of the
test statistic is 111.66 as against the critical value of 12.44 for a test of size
$\alpha = 0.01$. Finally the tests based on $A_n$ and $A_n^*$ both reject the null
hypothesis at the 0.01 level. The realized value of $\tau_n$ is 43.15 as against the
critical value of 1.50. The realized value of $\gamma_n$ is 53.27 as against the
critical value of 0.733.
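The order of magnitude quoted from (31) can be reproduced approximately with the following sketch. It assumes that $\sigma^2 = \sigma_1^2 + \sigma_2^2$ with both noise levels equal to 8.317, and that an interval of about 40 observations corresponds to $\delta_n = 40/4806$; the function name `delgado_detectable_eta` is hypothetical.

```python
import math


def delgado_detectable_eta(n, delta, sigma1, sigma2):
    """Deviation eta at which (31) starts to hold: eta = 4.48 sigma / (sqrt(n) delta)."""
    sigma = math.sqrt(sigma1 ** 2 + sigma2 ** 2)
    return 4.48 * sigma / (math.sqrt(n) * delta)


# Application: n = 4806 design points, noise level 8.317 for both data sets,
# differences concentrated on intervals of about 40 observations.
eta_needed = delgado_detectable_eta(4806, 40 / 4806, 8.317, 8.317)
print(round(eta_needed))
```

Under these assumptions the result is a little above 91, which agrees with the quoted order of 92 up to the rounding of the inputs.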



References 



Anderson, T. W. (1955). The integral of a symmetric unimodal function over
a symmetric convex set and some probability inequalities. Proceedings of the 
American Mathematical Society, 6(2):170-176. 

Davies, P. L. (1995). Data features. Statistica Neerlandica, 49:185-245. 

Davies, P. L. (2004). The one-way table: In honour of John Tukey 1915-2000. 
Journal of Statistical Planning and Inference, 122:3-13. 

Davies, P. L., Gather, U., Meise, M., Mergel, D., and Mildenberger, T. (2008a). 
Residual based localization and quantification of peaks in x-ray diffrac- 
tograms. Annals of Applied Statistics, to appear. 

Davies, P. L., Gather, U., Nordman, D. J., and Weinert, H. (2008b). A comparison
of automatic histogram constructions. ESAIM: Probability and Statistics.
To appear.

Davies, P. L., Gather, U., and Weinert, H. (2006). Nonparametric regression as 
an example of model choice. Technical Report 24/06, Sonderforschungsbereich 
475, Fachbereich Statistik, University of Dortmund, Germany. 

Davies, P. L. and Kovac, A. (2001). Local extremes, runs, strings and
multiresolution (with discussion). Annals of Statistics, 29(1):1-65.

Davies, P. L., Kovac, A., and Meise, M. (2008c). Nonparametric regression, 
confidence regions and regularization. Annals of Statistics. To appear. 



Davies and Kovac/Quantifying the cost of simultaneous approximation 






Delgado, M. A. (1992). Testing the equality of nonparametric regression curves.
Statistics and Probability Letters, 17:199-204.

Dette, H. and Neumeyer, N. (2001). Nonparametric analysis of covariance.
Annals of Statistics, 29:1361-1400.

Donoho, D. L. (1988). One-sided inference about functionals of a density.
Annals of Statistics, 16:1390-1420.

Donoho, D. L., Johnstone, I. M., Kerkyacharian, G., and Picard, D. (1995).
Wavelet shrinkage: asymptopia? Journal of the Royal Statistical Society,
57:371-394.

Dümbgen, L. (1998). New goodness-of-fit tests and their application to
nonparametric confidence sets. Annals of Statistics, 26:288-314.

Dümbgen, L. (2003). Optimal confidence bands for shape-restricted curves.
Bernoulli, 9(3):423-449.

Dümbgen, L. (2007). Confidence bands for convex median curves using sign-tests.
In Cator, E., Jongbloed, G., Kraaikamp, C., Lopuhaa, R., and Wellner,
J., editors, Asymptotics: Particles, Processes and Inverse Problems, volume 55
of IMS Lecture Notes - Monograph Series, pages 85-100. IMS, Hayward.

Dümbgen, L. and Johns, R. (2004). Confidence bands for isotonic median curves
using sign-tests. J. Comput. Graph. Statist., 13(2):519-533.

Dümbgen, L. and Spokoiny, V. G. (2001). Multiscale testing of qualitative
hypotheses. Annals of Statistics, 29(1):124-152.

Fan, J. and Lin, S. K. (1998). Test of significance when data are curves. Journal
of the American Statistical Association, 93:1007-1021.

Hall, P. and Hart, J. D. (1990). Bootstrap test for difference between means
in nonparametric regression. Journal of the American Statistical Association,
85(412):1039-1049.

Härdle, W. and Marron, J. S. (1990). Semiparametric comparison of regression
curves. Annals of Statistics, 18(1):63-89.

Kabluchko, Z. (2007). Extreme-value analysis of standardized Gaussian
increments. arXiv:0706.1849.

King, E., Hart, J. D., and Wehrly, T. E. (1990). Testing the equality of two
regression curves using linear smoothers. Statistics and Probability Letters,
12:239-247.

Kulasekera, K. B. (1995). Comparison of regression curves using quasi-residuals.
Journal of the American Statistical Association, 90(431):1085-1093.

Kulasekera, K. B. and Wang, J. (1997). Smoothing parameter selection for
power optimality in testing of regression curves. Journal of the American
Statistical Association, 92(438):500-511.

Lavergne, P. (2001). An equality test across nonparametric regressions. Journal
of Econometrics, 103:307-344.

Li, K.-C. (1989). Honest confidence regions for nonparametric regression. Annals
of Statistics, 17:1001-1008.

Munk, A. and Dette, H. (1998). Nonparametric comparison of several regression
functions: Exact and asymptotic theory. Annals of Statistics, 26(6):2339-2368.

Neumeyer, N. and Dette, H. (2003). Nonparametric comparison of regression
curves - an empirical process approach. Annals of Statistics, 31:880-920.








Fig 1. The top panel shows two samples each of size 500 generated by $Y_1(t) = \exp(1.5t) +
0.25Z(t)$ and $Y_2(t) = \exp(1.5t) + 3 + 0.25Z(t)$ together with the approximating monotonic
curves. The design points were taken to be i.i.d. random variables uniformly distributed on
$[0, 1]$. The centre panel shows a joint approximating function with 514 local extreme values.
The bottom panel shows a sample of size $n = 1000$ generated using the function of the centre
panel.

Fig 2. The upper panel shows the data of Figure 1 but with the values of $Y_2$ now given by
$Y_2(t) = \exp(1.5t) + 0.07 + 0.25Z(t)$. There is now a joint monotonic approximating function
which is shown in the lower panel.

Fig 3. The upper panel shows the results of minimizing the total variation of the second
derivative for the data of the upper panel of Figure 2 subject to monotonicity. The lower
panel shows the corresponding result for the lower panel of Figure 2.

Fig 4. The upper panel shows the second derivative of the functions in the upper panel of
Figure 3; the lower panel shows the corresponding result for the lower panel of Figure 3.

Fig 5. The upper panel shows the function $f_1(t) = \exp(1.5t)$ and the function $f_2$ which is
equal to $f_1$ apart from the interval $[0.402, 0.440]$ where $f_2(t) = f_1(t) + 0.575$. The lower
panel shows the two data sets $Y_1(t_j) = f_1(t_j) + 0.25Z_1(t_j)$ and
$Y_2(t_j) = f_2(t_j) + 0.25Z_2(t_j)$ for $j = 1, \dots, 500$ and with $t_j = j/500$.

Fig 6. The top row shows the power functions of the four tests with $g$ given by (1) and (2).
The bottom row shows the power functions of the four tests with $g$ given by (3) and (4). The
Delgado-Neumeyer-Dette test is shown in blue, the Fan-Lin test in black, the test based on
$A_n$ in green and that based on $A_n^*$ in red.

Fig 7. The top and center panels show two data sets each of 4806 observations with the same
design points. The lower panel shows the differences of the two samples.



